Topic-Focused Multi-Document Summarization Using an Approximate Oracle Score

نویسندگان

  • John M. Conroy
  • Judith D. Schlesinger
  • Dianne P. O'Leary
چکیده

We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, in this paper, we explore just how well an extractive method can perform. We introduce an “oracle” score, based on the probability distribution of unigrams in human summaries. We then demonstrate that with the oracle score, we can generate extracts which score, on average, better than the human summaries, when evaluated with ROUGE. In addition, we introduce an approximation to the oracle score which produces a system with the best known performance for the 2005 Document Understanding Conference (DUC) evaluation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization

Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs)....

متن کامل

Improving summarization performance by sentence compression: a pilot study

In this paper we study the effectiveness of applying sentence compression on an extraction based multi-document summarization system. Our results show that pure syntactic-based compression does not improve system performance. Topic signature-based reranking of compressed sentences does not help much either. However reranking using an oracle showed a significant improvement remains possible.

متن کامل

Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning

Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...

متن کامل

Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion

In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplificat...

متن کامل

Multi-document Summarization Based on Atomic Semantic Events and Their Temporal Relationships

Automatic multi-document summarization (MDS) is the process of extracting the most important information such as events and entities from multiple natural language texts focused on the same topic. We extract all types of semantic atomic information and feed them to a topic model to experiment with their effects on a summary. We design a coherent summarization system by taking into account the s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006